Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
We consider online learning for episodic stochastically constrained Markov decision processes (CMDPs), which plays a central role in ensuring the safety of reinforcement learning. Here the loss function can vary arbitrarily across the episodes, whereas both the loss received and the budget consumption are revealed at the end of each episode. Previous works solve this problem under the restrictive assumption that the transition model of the MDP is known a priori and establish regret bounds that depend polynomially on the cardinalities of the state space $\mathcal{S}$ and the action space $\mathcal{A}$. In this work, we propose a new \emph{upper confidence primal-dual} algorithm, which only requires the trajectories sampled from the transition model. In particular, we prove that the proposed algorithm achieves $\widetilde{\mathcal{O}}(L|\mathcal{S}|\sqrt{|\mathcal{A}|T})$ upper bounds on both the regret and the constraint violation, where $L$ is the length of each episode. Our analysis incorporates a new high-probability drift analysis of Lagrange multiplier processes into the celebrated regret analysis of upper confidence reinforcement learning, which demonstrates the power of ``optimism in the face of uncertainty'' in constrained online learning.
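To make the primal-dual mechanics concrete, below is a minimal tabular sketch, with hypothetical names and shapes throughout: it plans optimistically on a bonus-adjusted Lagrangian that combines the episode's loss with the constraint cost weighted by a Lagrange multiplier, and then updates that multiplier as a virtual queue. This is an illustrative sketch under simplifying assumptions, not the paper's actual occupancy-measure-based algorithm.

```python
import numpy as np

# Illustrative sketch only: the paper's algorithm operates on occupancy
# measures with a transition confidence set; here we keep the two key
# ingredients in their simplest tabular form. Shapes are assumptions:
# loss, cost, counts, bonus are (S, A); p_hat is (S, A, S).

def exploration_bonus(counts, delta=0.01):
    """Hoeffding-style count-based bonus (an assumed, simplified form)."""
    return np.sqrt(np.log(1.0 / delta) / np.maximum(counts, 1))

def optimistic_plan(loss, cost, q_dual, p_hat, bonus, horizon):
    """Backward induction on the bonus-adjusted Lagrangian loss + q_dual * cost.

    Subtracting the bonus makes rarely visited state-action pairs look
    attractive -- the "optimism in the face of uncertainty" ingredient.
    """
    num_states, _ = loss.shape
    v = np.zeros(num_states)                     # terminal values are zero
    lagrangian = loss + q_dual * cost            # primal objective for the episode
    policy = []
    for _ in range(horizon):
        q_vals = lagrangian - bonus + p_hat @ v  # optimistic action values, (S, A)
        policy.append(q_vals.argmin(axis=1))     # greedy w.r.t. optimistic values
        v = q_vals.min(axis=1)
    return policy[::-1]                          # decision rule for step 0 first

def dual_update(q_dual, episode_cost, budget):
    """Virtual-queue update of the Lagrange multiplier; the high-probability
    drift analysis bounds how large this process can grow, which in turn
    bounds the constraint violation."""
    return max(q_dual + episode_cost - budget, 0.0)
```

In each episode, one would roll out the greedy policy, observe the realized loss and constraint cost, refresh the visit counts and the empirical model p_hat, and then call dual_update with the episode's total constraint cost.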
Review for NeurIPS paper: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
Weaknesses: (W1): The high-level outline of the proof strategy follows previous procedures for the drift analysis in (Yu et al. 2017) and the MDP analysis in (Neu et al. 2012) and (Rosenberg et al. 2019). Lemma B.2 is very similar to Lemma 4 in Neu et al. 2012 and Lemma B.2 in Rosenberg et al. 2019. Lemma 5.2 mirrors Lemma 8 in Yu et al. 2017. The technical lemmas for the stochastic analysis are also taken from prior work: Lemmas B.6 and B.7 are Lemmas 5 and 9 in Yu et al. 2017. The main lemma, Lemma 5.3, has the same goal as Lemma 7 in Yu et al. 2017, namely to show that Q_t satisfies the drift condition stated in Lemma 5 of Yu et al. 2017. Lemma 5.6 is identical to Lemma 3 in Yu et al. 2017.
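For context, the drift condition in question is, schematically, a Hajek-type condition on the multiplier process: bounded increments together with negative expected drift above a threshold (stated here in its generic form; the exact constants are specific to the cited papers):
$$|Q_{t+1} - Q_t| \le \nu_{\max}, \qquad \mathbb{E}\big[\, Q_{t+1} - Q_t \mid Q_t \ge \theta \,\big] \le -\zeta \quad \text{for some } \theta, \zeta, \nu_{\max} > 0,$$
which yields a high-probability bound on Q_t and, in turn, on the cumulative constraint violation.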
Review for NeurIPS paper: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss
I want to thank the authors for preparing the detailed rebuttal. This paper was discussed among all the reviewers during the post-rebuttal discussion phase. Overall, the reviewers are excited about this work on solving constrained MDP problems and have a positive assessment of the paper. All the reviewers acknowledged the theoretical contributions, especially in a challenging setting with unknown dynamics and non-stationary loss functions. There was a clear consensus that the paper should be accepted.